Sentence Compression for the LSA-based Summarizer

Authors

  • Josef Steinberger
  • Karel Jezek
Abstract

We present a simple sentence compression approach for our summarizer, which is based on latent semantic analysis (LSA). The summarization method assigns each sentence an LSA score. The compression algorithm removes unimportant clauses from a full sentence. First, the sentence is divided into clauses by the Charniak parser; then compression candidates are generated; finally, the best candidate is selected to represent the sentence. Each candidate receives an importance score that is directly proportional to its LSA score and inversely proportional to its length. We evaluated the approach in two ways. In an intrinsic evaluation, we found that the compressions produced by our algorithm are better than the baseline compressions but still worse than those produced by humans. We then compared the resulting summaries with human abstracts using the standard n-gram-based ROUGE measure.
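
The candidate generation and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the clauses have already been produced by a parser and marked as removable or not. The names `generate_candidates`, `importance`, `compress`, and `toy_lsa_score` are hypothetical, and the word-weight lookup is only a stand-in for a real LSA score. The exact scoring formula is not given in the abstract, so the score is taken here to be simply the LSA score divided by the candidate's length in words.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# A clause is (text, removable). In the paper the clauses come from the
# Charniak parser; here they are supplied by hand (assumption).
Clause = Tuple[str, bool]


def generate_candidates(clauses: Sequence[Clause]) -> List[str]:
    """Return every candidate obtained by dropping a subset of removable clauses."""
    removable = [i for i, (_, can_drop) in enumerate(clauses) if can_drop]
    candidates = []
    for k in range(len(removable) + 1):
        for dropped in combinations(removable, k):
            kept = [text for i, (text, _) in enumerate(clauses) if i not in dropped]
            if kept:  # never emit an empty candidate
                candidates.append(" ".join(kept))
    return candidates


def importance(candidate: str, lsa_score: Callable[[str], float]) -> float:
    """Assumed scoring: LSA score divided by length in words."""
    return lsa_score(candidate) / len(candidate.split())


def compress(clauses: Sequence[Clause], lsa_score: Callable[[str], float]) -> str:
    """Pick the candidate with the highest importance score."""
    return max(generate_candidates(clauses), key=lambda c: importance(c, lsa_score))


# Toy stand-in for an LSA score: a sum of made-up word weights (hypothetical).
TOY_WEIGHTS = {"parser": 0.9, "splits": 0.6, "sentences": 0.8, "clauses": 0.7}


def toy_lsa_score(text: str) -> float:
    return sum(TOY_WEIGHTS.get(w.lower().strip(".,"), 0.0) for w in text.split())


clauses = [
    ("The parser splits sentences into clauses", False),
    ("which we mention only in passing", True),
]
best = compress(clauses, toy_lsa_score)
print(best)
```

With these toy weights the removable clause adds length but no score, so the shorter candidate is selected. A real system would replace `toy_lsa_score` with a score derived from the LSA representation of the document.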

Similar resources

Text summarization using a trainable summarizer and latent semantic analysis

This paper proposes two approaches to address text summarization: modified corpus-based approach (MCBA) and LSA-based T.R.M. approach (LSA+T.R.M.). The first is a trainable summarizer, which takes into account several features, including position, positive keyword, negative keyword, centrality, and the resemblance to the title, to generate summaries. Two new ideas are exploited: (1) sentence po...

SUTLER: Update Summarizer Based on Latent Topics

This paper deals with our past and recent research in text summarization. We went from single-document summarization through multidocument summarization to update summarization. We describe the development of our summarizer which is based on latent semantic analysis (LSA). The classical LSA-based summarization model was improved by Iterative Residual Rescaling. We propose the update summarizati...

Two uses of anaphora resolution in summarization

We propose a new method for using anaphoric information in Latent Semantic Analysis (LSA), and discuss its application to develop an LSA-based summarizer which achieves a significantly better performance than a system not using anaphoric information, and a better performance by the ROUGE measure than all but one of the single-document summarizers participating in DUC-2002. Anaphoric information...

A BE-based Multi-document Summarizer with Sentence Compression

This paper describes a multi-document summarizer based on basic elements (BE), a head-modifier-relation representation of document content developed at ISI. To increase the coverage of automatically created summaries at a given length, we first generate a summary about twice the intended length, then apply compression techniques to make sure the resulting summaries fall within the length const...

A Comparison of Feature and Semantic-Based Summarization Algorithms for Turkish

In this paper we analyze the performance of one feature-based and two semantic-based text summarization algorithms on a new Turkish corpus. The feature-based algorithm uses the statistical analysis of paragraphs, sentences, words and formal clues found in documents, whereas the two semantic-based algorithms employ the Latent Semantic Analysis (LSA) approach, which enables the selection of the most imp...

Publication date: 2006